# Linear Regression Quickstart

Already know your way around linear regression and just need to see how to do it in Python? We're here for you! If not, continue on to the next section.

We're going to **ignore the nuance of what we're doing** in this notebook. It's really just for people who need to see the process.

<p class="reading-options">
  <a class="btn" href="/regression/linear-regression-quickstart">
    <i class="fa fa-sm fa-book"></i>
    Read online
  </a>
  <a class="btn" href="/regression/notebooks/Linear Regression Quickstart.ipynb">
    <i class="fa fa-sm fa-download"></i>
    Download notebook
  </a>
  <a class="btn" href="https://colab.research.google.com/github/littlecolumns/ds4j-notebooks/blob/master/regression/notebooks/Linear Regression Quickstart.ipynb" target="_new">
    <i class="fa fa-sm fa-laptop"></i>
    Interactive version
  </a>
</p>

## Pandas for our data

As is typical, we'll be using [pandas dataframes](https://pandas.pydata.org/) for the data.

```python
import pandas as pd

df = pd.DataFrame([
    { 'sold': 0, 'revenue': 0 },
    { 'sold': 4, 'revenue': 8 },
    { 'sold': 16, 'revenue': 32 },
])
df
```

| | sold | revenue |
|---|---|---|
| 0 | 0 | 0 |
| 1 | 4 | 8 |
| 2 | 16 | 32 |


## Performing a regression

The [statsmodels](https://www.statsmodels.org) package is your best friend when it comes to regression. In theory you can do it using other techniques or libraries, but statsmodels is just *so simple*.

For the regression below, I'm using the formula method of describing the regression. If that makes you grumpy, check the [regression reference page](/reference/regression/) for more details.

```python
import statsmodels.formula.api as smf

model = smf.ols("revenue ~ sold", data=df)
results = model.fit()
results.summary()
```

| | | | |
|---|---|---|---|
| Dep. Variable: | revenue | R-squared: | 1.0 |
| Model: | OLS | Adj. R-squared: | 1.0 |
| Method: | Least Squares | F-statistic: | 9.502e+30 |
| Date: | Sun, 08 Dec 2019 | Prob (F-statistic): | 2.07e-16 |
| Time: | 10:14:18 | Log-Likelihood: | 94.907 |
| No. Observations: | 3 | AIC: | -185.8 |
| Df Residuals: | 1 | BIC: | -187.6 |
| Df Model: | 1 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | -2.665e-15 | 6.18e-15 | -0.431 | 0.741 | -8.12e-14 | 7.58e-14 |
| sold | 2.0000 | 6.49e-16 | 3.08e+15 | 0.000 | 2.000 | 2.000 |

| | | | |
|---|---|---|---|
| Omnibus: | | Durbin-Watson: | 1.149 |
| Prob(Omnibus): | | Jarque-Bera (JB): | 0.471 |
| Skew: | -0.616 | Prob(JB): | 0.79 |
| Kurtosis: | 1.5 | Cond. No. | 13.4 |


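The `sold` coefficient is 2.0, so for each unit sold we get 2 units of revenue. The intercept is zero for all practical purposes (`-2.665e-15` is just floating-point noise). That's about it.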
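If you'd rather grab the numbers directly than read them off the summary table, a minimal sketch might look like this. It assumes the same toy data as above; `new_sales` and its value of 10 are hypothetical, added only to show a prediction.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Same toy data and model as above
df = pd.DataFrame({'sold': [0, 4, 16], 'revenue': [0, 8, 32]})
results = smf.ols("revenue ~ sold", data=df).fit()

# Coefficients come back as a pandas Series keyed by term name
slope = results.params['sold']
intercept = results.params['Intercept']
print(slope, intercept)

# Predict revenue for a hypothetical 10 units sold
new_sales = pd.DataFrame({'sold': [10]})
predicted = results.predict(new_sales)
print(predicted.iloc[0])
```

With a slope of 2 and an intercept of basically zero, the prediction for 10 units should land at roughly 20.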

## Multivariable regression

Multivariable regression is easy-peasy. Let's add a couple more columns to our dataset, `tips` and `charge_amount`, and bring tips into the equation.

```python
import pandas as pd

df = pd.DataFrame([
    { 'sold': 0, 'revenue': 0, 'tips': 0, 'charge_amount': 0 },
    { 'sold': 4, 'revenue': 8, 'tips': 1, 'charge_amount': 9 },
    { 'sold': 16, 'revenue': 32, 'tips': 2, 'charge_amount': 34 },
])
df
```

| | sold | revenue | tips | charge_amount |
|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 |
| 1 | 4 | 8 | 1 | 9 |
| 2 | 16 | 32 | 2 | 34 |


```python
import statsmodels.formula.api as smf

model = smf.ols("charge_amount ~ sold + tips", data=df)
results = model.fit()
results.summary()
```

| | | | |
|---|---|---|---|
| Dep. Variable: | charge_amount | R-squared: | 1.0 |
| Model: | OLS | Adj. R-squared: | |
| Method: | Least Squares | F-statistic: | 0.0 |
| Date: | Sun, 08 Dec 2019 | Prob (F-statistic): | |
| Time: | 10:14:20 | Log-Likelihood: | 89.745 |
| No. Observations: | 3 | AIC: | -173.5 |
| Df Residuals: | 0 | BIC: | -176.2 |
| Df Model: | 2 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | -1.685e-15 | inf | -0 | | | |
| sold | 2.0000 | inf | 0 | | | |
| tips | 1.0000 | inf | 0 | | | |

| | | | |
|---|---|---|---|
| Omnibus: | | Durbin-Watson: | 0.922 |
| Prob(Omnibus): | | Jarque-Bera (JB): | 0.52 |
| Skew: | -0.691 | Prob(JB): | 0.771 |
| Kurtosis: | 1.5 | Cond. No. | 44.0 |


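There you go! `charge_amount` goes up by 2 for each unit sold and by 1 for each unit of tips. Don't read anything into the `inf` standard errors: three rows and three coefficients leave zero residual degrees of freedom, so there's nothing left over to estimate uncertainty from.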
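If you want to double-check those coefficients without statsmodels, here's a rough sketch with plain NumPy. It's an addition, not part of the original notebook. Since this toy dataset has exactly as many rows as unknowns, the "regression" boils down to solving three equations for three numbers.

```python
import numpy as np

# Columns: intercept (all ones), sold, tips
X = np.array([
    [1.0, 0.0, 0.0],
    [1.0, 4.0, 1.0],
    [1.0, 16.0, 2.0],
])
# charge_amount for each row
y = np.array([0.0, 9.0, 34.0])

# Three equations, three unknowns: the system can be solved exactly
coefs = np.linalg.solve(X, y)

# Rows minus coefficients = residual degrees of freedom
residual_df = len(y) - X.shape[1]
print(coefs, residual_df)
```

The solution should come out at about 0 for the intercept, 2 for `sold` and 1 for `tips`, with zero residual degrees of freedom. On a real dataset with more rows than coefficients you'd want a least-squares fit like the statsmodels one instead of an exact solve.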

If you'd like more details, you can continue on in this section. If you'd just like the how-to-do-an-exact-thing explanations, check out the [regression reference page](/reference/regression/).